(57) Abstract

A method and structure for using a DRAM memory 
array (213) as a second level cache memory in a computer 
system (200). The computer system includes a central 
processing unit (CPU) (201), a first level SRAM cache 
memory (202), a CPU bus (204), and a second level
cache memory (213) which includes a DRAM array (317) 
coupled to the CPU bus. In one embodiment, the DRAM 
array is operated at a higher frequency than the CPU bus 
clock signal. In another embodiment, a widened data 
path is provided to the DRAM array. Both embodiments 
effectively increase the data rate of the DRAM array, 
thereby providing additional time for precharging the 
DRAM array. As a result, the precharging of the DRAM
array is transparent to the CPU bus. 
METHOD AND STRUCTURE FOR UTILIZING A DRAM
ARRAY AS SECOND LEVEL CACHE MEMORY 

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention relates to a method and
structure for implementing a memory system. More
specifically, the invention relates to a second level
cache memory.

Description of the Prior Art 

High-speed computer systems frequently use fast,
small-capacity cache (buffer) memory to transmit
signals between a fast processor and a slow (and low
cost), large-capacity main memory. Cache memory is
typically used to temporarily store data which has a
high probability of being selected next by the
processor. By storing this high probability data in a
fast cache memory, the average speed of data access for
the computer system is increased. Thus, cache memory
is a cost effective way to boost system performance (as
compared to using all high speed, expensive memories).
In more advanced computer systems, there are multiple
levels (usually two levels) of cache memory. The first
level cache memory, typically having a storage capacity
of 4 Kbytes to 32 Kbytes, is ultra-fast and is usually
integrated on the same chip with the processor. The
first level cache is faster because it is integrated
with the processor and therefore avoids any delay
associated with transmitting signals to and receiving
signals from an external chip. The second level cache
is usually located on a different chip than the
processor, and has a larger capacity, usually from 64
Kbytes to 1024 Kbytes.

Fig. 1 is a block diagram of a prior art computer
system 100 using an SRAM second level cache
configuration. The CPU or microprocessor 101
incorporates on-chip SRAM first level cache 102 to
support the very fast internal CPU operations
(typically from 33 MHz to 150 MHz).

First level cache 102 typically has a capacity of
4 Kbytes to 32 Kbytes and performs very high speed data
and instruction accesses (typically within 5 to 15 ns).
For a first level cache miss or other non-cacheable
memory access, the memory read and write operations
must go off-chip through the much slower external CPU
bus 104 (typically from 25 MHz to 60 MHz) to the SRAM
second level (L2) cache 106 (typically with 128 Kbytes
to 1024 Kbytes capacity) with the additional latency
(access time) penalty of round-trip off-chip delay.

The need for CPU 101 to manage the delay penalty
of off-chip operation dictates that in almost all
modern microprocessors, the fastest access cycle (read
or write) through the CPU bus 104 is 2-1-1-1. That is,
the first external access will consume at least 2 clock
cycles, and each subsequent external access will
consume a single clock cycle. At higher CPU bus
frequencies, the fastest first external access may take
3 or more clock cycles. A burst cycle having 4
accesses is mentioned here for purposes of illustration
only. Some processors allow shorter (e.g., 2) or
longer (e.g., 8 or more) burst cycles. Pipelined
operation, where the parameters of the first external
access of the second burst cycle are latched into CPU
bus devices while the first burst cycle is still in
progress, may hide the longer access latency for the
first external access of the second burst cycle. Thus,
the first and second access cycles may be 2-1-1-1 and
1-1-1-1, respectively.
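
The clock counts implied by this burst notation can be tallied with a
short sketch. The function name `burst_clocks` and the tuple
representation of a burst pattern are illustrative assumptions and do
not appear in the disclosure.

```python
def burst_clocks(pattern):
    """Return the total CPU bus clocks consumed by one burst cycle.

    `pattern` lists the clocks taken by each access in the burst,
    e.g. (2, 1, 1, 1) for a 2-1-1-1 burst.
    """
    return sum(pattern)


# A non-pipelined burst of 4 accesses: 2 + 1 + 1 + 1 clocks.
first = burst_clocks((2, 1, 1, 1))

# With pipelining, the second burst hides its initial latency
# and runs as 1-1-1-1.
second = burst_clocks((1, 1, 1, 1))

print(first, second, first + second)
```

Under these assumptions, two back-to-back pipelined bursts occupy 9 bus
clocks rather than the 10 clocks that two independent 2-1-1-1 bursts
would need.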

The cache tag memory 108 is usually relatively small
(from 8 Kbytes to 32 Kbytes) and fast (typically from
10 to 15 ns) and is implemented using SRAM cells.
Cache tag memory 108 stores the addresses of the cache
lines of second level cache 106 and compares these
addresses with an access address on CPU bus 104 to
determine if a cache hit has occurred. This small
cache tag memory 108 can be integrated with the system
logic controller chip 110 for better speed and lower
cost. An integrated cache tag memory operates in the
same manner as an external cache tag memory. Intel's
82430 PCI set for the Pentium processor is one example
of a logic controller chip 110 which utilizes an SRAM
integrated cache tag memory.
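
The tag comparison described above can be illustrated with a minimal
sketch. The direct-mapped organization, the 32-byte line size, the
256 Kbyte cache size and all identifiers below are assumptions chosen
for illustration; the text does not specify how cache tag memory 108
is organized.

```python
LINE_BYTES = 32            # assumed cache line size
NUM_LINES = 8192           # assumed: 256 Kbytes / 32 bytes per line


class TagMemory:
    """Direct-mapped tag store (the mapping policy is an assumption)."""

    def __init__(self):
        self.tags = [None] * NUM_LINES   # None marks an empty entry

    @staticmethod
    def split(address):
        """Split a byte address into (line index, tag)."""
        line = address // LINE_BYTES
        return line % NUM_LINES, line // NUM_LINES

    def fill(self, address):
        """Record that the line holding `address` is now cached."""
        index, tag = self.split(address)
        self.tags[index] = tag

    def is_hit(self, address):
        """Compare the stored tag with the tag of the access address."""
        index, tag = self.split(address)
        return self.tags[index] == tag


tags = TagMemory()
tags.fill(0x12340)
print(tags.is_hit(0x12344))                           # same cache line
print(tags.is_hit(0x12340 + NUM_LINES * LINE_BYTES))  # same index, other tag
```

In this sketch the first lookup falls in the cached line and hits,
while the second maps to the same entry with a different tag and
misses.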

One reason for the slower operating frequency of
CPU bus 104 is the significant loading caused by the
devices attached to CPU bus 104. Second level (L2)
SRAM cache memory 106 provides loading on the data and
address buses (through latch 112) of CPU bus 104.
Cache tag memory 108 provides loading on the address
bus, system logic controller chip 110 provides loading
on the control, data and address buses, and main memory
DRAM 114 provides loading on the data bus (through
latch 116).

In prior art computer system 100, the system logic
chip 110 provides an interface to a system (local) bus
118 having a typical operating frequency of 25 MHz to
33 MHz. System bus 118 may be attached to a variety of
relatively fast devices 120 (such as graphics, video,
communication, or fast disk drive subsystems). System
bus 118 can also be connected to a bridge or buffer
device 122 for connecting to a general purpose (slower)
extension bus 124 (at 4 MHz to 16 MHz operating
frequency) that may have many peripheral devices (not
shown) attached to it.

Traditional high speed cache systems, whether
first level or second level, are implemented using
static random access memories (SRAMs) because SRAMs
are fast (with access times ranging from 7 to 25
nanoseconds (ns) and cycle times equal to access
times). SRAMs are suitable for storing and retrieving
data for high-speed microprocessors having bus speeds
of 25 to 100 megahertz. Traditional dynamic random
access memories (DRAMs) are less expensive than SRAMs
on a per bit basis because DRAM has a much smaller cell
size. For example, a DRAM cell is typically one
quarter of the size of an SRAM cell using comparable
lithography rules. DRAMs are generally not considered
to be suitable for high speed operation because DRAM
accesses inherently require a two-step process having
access times ranging from 50 to 120 ns and cycle times
ranging from 90 to 200 ns.

Access speed is a relative measurement. That is,
while DRAMs are slower than SRAMs, they are much faster
than other earlier-era memory devices such as ferrite
core and charge-coupled devices (CCD). As a result,
DRAM could theoretically be used as a "cache" memory in
systems which use these slower memory devices as a
"main memory." The operation modes and access methods,
however, are different from the operation modes and
access methods disclosed herein.

In most computer systems, the second level cache
operates in a fixed and rigid mode. That is, any read
or write access to the second level cache is of a few
constant sizes (line sizes of the first and second
level caches) and is usually in a burst sequence of 4
or 8 words (i.e., consecutive reads or writes of 4 or 8
words) or in a single access (i.e., one word). These
types of accesses allow standard SRAMs to be modified
to meet the timing requirements of
very high speed processor buses. One such example is
the burst or synchronous SRAM, which incorporates an
internal counter and a memory clock to increment an
initial access address. External addresses are not
required after the first access, thereby allowing the
SRAM to operate faster after the first access is
performed. The synchronous SRAM may also have special
logic to provide preset address sequences, such as
Intel's interleaved address sequence. Such performance
enhancement, however, does not reduce the cost of using
SRAM cells to store memory bits.
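
The address sequences that such an internal counter can generate are
sketched below. The function name `burst_sequence` is hypothetical.
The interleaved order is modeled here as the start address XORed with
the counter value, which is the commonly documented form of Intel's
burst order; that modeling choice is an assumption, not a statement
from this text.

```python
def burst_sequence(start, length=4, interleaved=False):
    """Return the word addresses of one burst.

    start       -- word address of the first access
    length      -- burst length (a power of two)
    interleaved -- False: linear counter that wraps within the block
                   True:  start offset XOR counter
    """
    base = start & ~(length - 1)      # aligned start of the burst block
    offset = start & (length - 1)     # position inside the block
    if interleaved:
        return [base | (offset ^ i) for i in range(length)]
    return [base | ((offset + i) % length) for i in range(length)]


print(burst_sequence(5))                    # linear order
print(burst_sequence(5, interleaved=True))  # interleaved order
```

Only the first address comes from the bus; the remaining addresses are
derived internally, which is why no external address is needed after
the first access.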

Synchronous DRAMs (SDRAM) have adopted similar
burst-mode operation. Video RAMs (VRAM) have adopted
the serial port operation of dual-port DRAMs. These
new DRAMs are still not suitable for second level cache
operation, however, because their initial access time
and random access cycle time remain much slower than
necessary.

It would therefore be desirable to have a
structure and method which enables DRAM memory to be
used as a second level cache memory.

Prior art computer systems have also included
multiple levels of SRAM cache memory integrated on the
same chip as the CPU. For example, DEC's Alpha 21164
processor integrates 16 Kbytes of first level SRAM
cache memory and 96 Kbytes of second level SRAM memory
on the same chip. In such cases, a third level SRAM
cache is typically used between the processor and a
DRAM main memory. In such a computer system, it would
be desirable to use a DRAM memory to replace the third
level SRAM cache memory.

Prior art high-performance second level SRAM cache
memory devices generally conform to a set of pin and
function specifications to assure that system logic
controller 110 may operate compatibly with a variety of
different SRAM cache memories from multiple suppliers.
Several examples of such pin and function
specifications are set forth in the following
references: "Pentium™ Processor 3.3V Pipelined BSRAM
Specification", Version 1.2, Intel Corporation, October
5, 1994; "32K x 32 CacheRAM™ Pipelined/Flow Through
Outputs Burst Counter, & Self-Timed Write - For
Pentium™/PowerPC™ Processors", Advance Information
IDT71V432, Integrated Device Technology, Inc., May
1994; and "32K x 32 CacheRAM™ Burst Counter & Self-Timed
Write - For the Pentium™ Processor", Preliminary
IDT71420, Integrated Device Technology, Inc., May 1994.

It is therefore desirable to have a method and
structure which enables DRAM memory to be used as a
second level cache memory which can be interfaced to a
conventional logic controller which normally controls a
second level SRAM cache memory. It is further
desirable to have such a method and structure which
requires minimal modification to the conventional logic
controller.

SUMMARY OF THE INVENTION 

In accordance with the present invention, a
structure and method for configuring a DRAM array, or a
plurality of DRAM arrays, as a second level cache
memory is provided. A structure in accordance with the
invention includes a computer system having a central
processing unit (CPU), an SRAM cache memory integrated
with the CPU, a CPU bus coupled to the CPU, and a
second level cache memory comprising a DRAM array
coupled to the CPU bus. The second level cache memory
is configured as stand alone memory in one embodiment.
In another embodiment, the second level cache memory is
configured and integrated with system logic on a
monolithic integrated circuit (IC). For high pin count
microprocessors such as Intel's Pentium, the companion
system logic may be partitioned into
multiple chips (e.g., Intel's 82430 PCI set). In such
a system, the second level cache DRAM array of the
present invention may be integrated with one of the
system logic chips, preferably the system logic chip(s)
for the data path. In another configuration, the
second level cache memory can be integrated with the
CPU itself.

When accessing the DRAM array of the present
invention, row access and column decoding operations
are performed in a self-timed asynchronous manner.
Predetermined sequences of column select operations are
then performed, wherein the column select operations
are synchronous with respect to a clock signal. This
asynchronous-synchronous accessing scheme reduces the
access latency of the DRAM array.

In one embodiment, the DRAM array is operated in a
dual-edge transfer mode in response to the CPU bus
clock signal. Consequently, the DRAM array performs
access operations at twice the frequency of the CPU
bus clock signal. DRAM
access therefore occurs twice as fast as operations on
the CPU bus.

In another embodiment, the second level cache
memory includes a phase locked loop (PLL) circuit
coupled to the CPU bus. The PLL circuit generates a
fast clock signal having a frequency greater than the
frequency of a CPU bus clock signal. The fast clock
signal is provided to the DRAM array to control read
and write operations. In one embodiment, the fast
clock signal has a frequency equal to twice the
frequency of the CPU bus clock signal. Again, DRAM
access occurs twice as fast as the operations on the
CPU bus.

In yet another embodiment, the second level cache
memory includes a phase locked loop (PLL) circuit
coupled to the CPU bus. The PLL circuit generates
buffered clock signals which have the same frequency as
the CPU bus clock signal and which may have various phase
relationships with respect to the CPU bus clock signal.

Data values can be read from the DRAM array to the
CPU bus through a read first in first out (data buffer)
memory having a data input port coupled to the DRAM
array and a data output port coupled to the CPU bus.
The data input port is clocked by the fast clock signal
and the data output port is clocked by the CPU bus
clock signal. Because data is read out of the DRAM
array faster than the data is read out to the CPU bus,
additional time is available during which the DRAM
array can be precharged. The precharge time is thereby
"hidden" from the CPU bus during a read operation from
the second level cache memory. Alternatively, the
width of the data input port between the DRAM array and
the read data buffer can be widened, and the data input
port can be clocked by a buffered version of the CPU
bus clock signal. This alternative also provides a
faster internal data transfer rate between the DRAM
array and the read data buffer, thereby providing
additional time in which the DRAM array can be
precharged.
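
The time freed for precharging can be estimated with a rough sketch.
The function name `spare_bus_clocks`, the 50 MHz bus frequency and the
simplification that the initial access latency is ignored are all
assumptions made for illustration.

```python
def spare_bus_clocks(burst_length, speedup):
    """Bus clocks left over for precharge during one burst read.

    burst_length -- number of data words sent to the CPU bus,
                    one per bus clock
    speedup      -- internal data rate relative to the CPU bus
                    (2 for a doubled clock or a doubled data path)
    """
    bus_clocks = burst_length                 # time to drain the read buffer
    array_clocks = burst_length / speedup     # time the DRAM array is busy
    return bus_clocks - array_clocks


BUS_MHZ = 50                                  # assumed CPU bus frequency
clock_ns = 1000 / BUS_MHZ                     # 20 ns per bus clock

spare = spare_bus_clocks(burst_length=4, speedup=2)
print(spare, spare * clock_ns)
```

Under these assumptions a burst of 4 words with a doubled internal
data rate leaves 2 bus clocks, or 40 ns, in which the DRAM array can
be precharged while the read data buffer is still feeding the CPU bus.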

Data values can also be written from the CPU bus
to the DRAM array through a write data buffer memory
having a data output port coupled to the DRAM array and
a data input port coupled to the CPU bus. The output
port of the write data buffer memory is clocked by the
fast clock signal and the input port of the write data
buffer memory is clocked by the CPU bus clock signal.
A first set of data values is written and stored in the
write data buffer memory until a second set of data
values is written to the write data buffer memory. At
this time, the first set of data values is written to
the DRAM array at the frequency of the fast clock
signal. Because the first set of data values is
written to the DRAM array faster than the second set of
data values is written to the write data buffer memory,
a DRAM precharge operation can be performed during the
time the second set of data values is written to the
write data buffer memory. Therefore, the DRAM
precharge operation is effectively "hidden" from the
CPU bus during a write operation to the second level
cache memory. Alternatively, the width of the data
output port between the write data buffer memory and
the DRAM array can be widened, and the data output port
can be clocked by a buffered version of the CPU bus
clock signal. This alternative also provides a faster
internal data transfer rate between the write data
buffer memory and the DRAM array, thereby providing
additional time in which the DRAM array can be
precharged.

By operating the DRAM array with a faster clock
signal or a wider data path than the CPU bus, a DRAM
memory array can be used to satisfy the speed and
operational requirements of a second level cache
memory. Such a DRAM memory array can be used at a
lower cost, typically 75% less, than traditional SRAM
implementations.

In another embodiment, data values to and from the
DRAM array are routed through a sense amplifier
circuit, a data amplifier circuit and a column selector
coupled between the sense amplifier circuit and the
data amplifier circuit. Writing data values to the
DRAM array then involves the steps of (1) opening the
column selector to isolate the data amplifier circuit
from the sense amplifier circuit, (2) writing the data
values from the write data buffer memory to the data
amplifier circuit substantially in parallel with
performing a row access operation in the DRAM array,
and (3) closing the column selector to connect the data
amplifier circuit to the sense amplifier circuit,
thereby causing the data values to be provided to the
DRAM array through the sense amplifier circuit. By
writing data values to the write data buffer memory in
parallel with the row access operation, more time is
available to precharge the DRAM array.

The column selector can also be used during a DRAM
read operation to provide additional time for a DRAM
precharge operation. To do this, data values are read
from the DRAM array to the sense amplifier circuit.
The column selector is then closed to connect the sense
amplifier circuit to the data amplifier circuit. After
the data values have been written to the data amplifier
circuit, the column selector is opened, thereby
isolating the sense amplifier circuit from the data
amplifier circuit. The data values can then be read
out of the data amplifiers while the DRAM array is
being precharged.

The DRAM cache memory of the present invention
operates on a transaction by transaction basis. A
transaction is defined as a complete read or write data
access cycle for a given address. A transaction can
involve the transfer of a single data value, or the
burst transfer of 4 data values. A burst transfer can
transfer the data values on consecutive clock cycles,
every other clock cycle, every third clock cycle, etc.
A transaction in the DRAM cache memory must be executed
as either a read or a write transaction, but cannot be
both. That is, the DRAM cache memory transaction
cannot include partial read and partial write
transactions, or change from a read transaction into a
write transaction before the data transfer begins. In
contrast, in standard SRAM, Burst SRAM (BSRAM) or
Pipelined Burst SRAM (PBSRAM) memories, a transaction
can start as either a read or a write and change into
a write or a read on a clock by clock basis. This is
because SRAM accesses, whether with or without input
registers or output registers, are directly from and to
the memory cell array and the read or write operation
can be applied to the memory cells directly.

The transaction-based configuration of the DRAM
cache memory of the present invention utilizes control
signals to prevent any incorrect or delayed internal
operations which might otherwise occur due to the
internal two-step access (RAS and CAS) of the DRAM
cache memory and the write data buffer used to buffer
the write operation. In a preferred embodiment, a
CPU-initiated address strobe input signal (ADSP#) and a
controller-initiated address strobe input signal
(ADSC#) are used to indicate the start of new
transactions in a manner compatible with standard
PBSRAM. A byte write enable input signal (BWE#) and a
global write input signal (GW#) are used as write
control signals in a manner compatible with standard
PBSRAM. An additional W/R# input signal (which is
typically driven by the CPU) is incorporated to enable
read and write transactions of the DRAM cache memory to
be performed in a well-defined manner.
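
The rule that a transaction is fixed as either a read or a write can
be modeled with a minimal sketch. The class `Transaction`, its
methods, and the convention that a high W/R# level denotes a write are
assumptions for illustration, not details taken from the disclosure.

```python
class Transaction:
    """One read or write access cycle; the direction is fixed at the start."""

    def __init__(self, address, is_write, burst_length=1):
        self.address = address
        self.is_write = is_write          # sampled W/R# (assumed high = write)
        self.remaining = burst_length     # data values still to transfer

    def transfer(self, is_write):
        """Move one data value; reject a change of direction mid-transaction."""
        if is_write != self.is_write:
            raise ValueError("direction cannot change within a transaction")
        if self.remaining == 0:
            raise ValueError("transaction already complete")
        self.remaining -= 1
        return self.remaining == 0        # True when the burst is finished


read = Transaction(0x100, is_write=False, burst_length=4)
print([read.transfer(False) for _ in range(4)])
```

Attempting `transfer(True)` on a read transaction raises an error in
this model, mirroring the requirement that a new transaction must be
started (by ADSP# or ADSC#) before the direction can change.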

The DRAM array, unlike the SRAM array, also
requires periodic refresh operations to restore the
charge in the cell capacitors to guarantee data
integrity. To manage the internal refresh operation of
the DRAM array without disrupting normal CPU and system
controller operations, a handshake (Krdy) signal is
required to communicate between the DRAM cache memory
and the system controller, so that the latter may delay
its own operation and operation of the CPU while the
DRAM array is being refreshed. In a preferred
embodiment, one signal pin of the DRAM array is used to
carry the handshake signal. The single pin maintains
maximum compatibility with standard PBSRAM system
controllers.

In one embodiment, the falling edge of the Krdy
signal indicates there is a pending refresh or other
internal operation request, and the rising edge of the
Krdy signal indicates the refresh or other internal
operation has been completed. The polarity of the Krdy
signal is chosen arbitrarily, and opposite polarity can
be used to accomplish the same effect. Both the DRAM
cache memory and the system controller sample the Krdy
signal at least at the beginning of each new
transaction, whether the transaction is initiated by
the ADSP# or ADSC# signal.

The Krdy signal can be used in different manners.
In a preferred embodiment, the Krdy signal is
implemented as an input/output signal. When multiple
DRAM cache memory devices are used together for memory
width or depth expansion or both, the Krdy signal can
be used for synchronizing the DRAM refresh and/or
internal operation among the multiple devices.
Specifically, one of the DRAM cache memory devices is
designated as a master device for refresh management.
This master DRAM cache memory device uses the Krdy
signal to communicate with the system controller and
control the refresh management function. Each of the
remaining DRAM cache memory devices shares the Krdy
signal line and is designated as a slave device. Each
slave device samples the state of the Krdy signal to
control its own refresh or internal operation as
appropriate.

In an alternative embodiment, the Krdy signal is
driven by the system controller, and each DRAM cache
memory, upon detecting a low Krdy signal, will initiate
and complete a pre-defined refresh operation.
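
The handshake in which the DRAM cache memory drives the Krdy signal
can be sketched as a small model. The class `KrdyLine` and the
function `start_transaction` are hypothetical names, and the sketch
uses the polarity of the embodiment described above (low indicates a
pending or ongoing refresh).

```python
class KrdyLine:
    """Models the Krdy handshake: low = refresh pending or in progress."""

    def __init__(self):
        self.level = 1                 # high: cache is ready

    def request_refresh(self):
        self.level = 0                 # falling edge: refresh pending

    def refresh_done(self):
        self.level = 1                 # rising edge: refresh completed


def start_transaction(krdy):
    """System controller samples Krdy at the start of each transaction."""
    return "proceed" if krdy.level == 1 else "wait"


krdy = KrdyLine()
print(start_transaction(krdy))         # cache ready
krdy.request_refresh()
print(start_transaction(krdy))         # controller must delay
krdy.refresh_done()
print(start_transaction(krdy))         # cache ready again
```

Because only the sampled level matters in this model, the opposite
polarity could be used simply by inverting the two levels.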

The present invention will be more fully
understood in light of the following detailed
description taken together with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a prior art computer
system having an SRAM second level cache memory;

Fig. 2 is a block diagram of a computer system
having a DRAM second level cache memory in accordance
with the invention;

Figs. 3(a) and 3(b) illustrate a schematic diagram
and a timing diagram, respectively, of a self-timed
RAS/CAS/burst accessing sequencer;

Fig. 4 is a schematic diagram of a fast column
accessing circuit;

Fig. 5 is a schematic diagram of circuitry which
provides for operation of a DRAM second level cache
memory at twice the frequency of the CPU bus clock;

Fig. 6 is a timing diagram of a 2-1-1-1 DRAM
second level cache read operation;

Fig. 7 is a timing diagram of a 2-1-1-1 DRAM
second level cache write operation;

Fig. 8 is a timing diagram of a 2-1-1-1 DRAM
second level cache read operation in accordance with an
alternate embodiment of the invention;

Fig. 9 is a timing diagram of a 2-1-1-1 DRAM
second level cache write operation in accordance with
an alternate embodiment of the invention;

Fig. 10 is a timing diagram of a 3-1-1-1 DRAM
second level cache read operation in accordance with an
embodiment of the invention;

Fig. 11 is a timing diagram of a 3-1-1-1 DRAM
second level cache write operation in accordance with
an embodiment of the invention;

Fig. 12 is a schematic diagram of a refresh
management controller;

Fig. 13 is a schematic diagram illustrating a DRAM
second level cache in a typical system environment with
key signal pins, in accordance with an embodiment of
the invention;

Figs. 14a and 14b are timing diagrams of
transaction-based DRAM second level cache read and
write operations in accordance with an embodiment of
the invention; and

Fig. 15 is a timing diagram of a handshake
protocol of a cache ready signal for a DRAM second
level cache.

DETAILED DESCRIPTION OF THE INVENTION

Fig. 2 is a block diagram of computer system 200
which includes second level DRAM cache memory 213 in
accordance with the invention. In addition to second
level DRAM cache memory 213, computer system 200
includes CPU 201, first level SRAM cache 202, CPU bus
204, latches 212 and 216, second level SRAM cache tag
memory 208, system logic controller 211, main DRAM
memory 214, system bus 218, bridge buffer circuit 222,
system bus devices 220 and extension bus 224.

Although DRAM cache memory 213 is referred to as a
"second level" cache memory, it is understood that the
present invention is also applicable to other
"levels" of cache memory higher than the second level
(e.g., third level or fourth level). In general, the
present invention is applicable to the "next level"
cache memory, where the "next level" cache memory is
defined as the cache memory which is connected between
the processor and a large-capacity main memory (where
the main memory is typically DRAM). Using this
definition, the term "second level cache memory" is
interchangeable with the term "next level cache memory"
in the following description.

CPU 201, which is typically fabricated on the same
integrated circuit chip as first level SRAM cache
memory 202, is coupled to the control, address and data
lines of the CPU bus 204. Second level SRAM cache tag
memory 208 receives address signals from the address
lines of CPU bus 204 through latch 212. System logic
controller 211 (which controls second level DRAM cache
memory 213 and main memory 214) is coupled to the
control, address and data lines of CPU bus 204. In one
embodiment, main DRAM memory 214 receives data signals
from the data lines of CPU bus 204 through latch 216.
In another embodiment, main DRAM memory 214 receives
data signals from system logic controller 211 through
alternate data path 226. System logic controller 211
interacts with cache tag memory 208 and main memory 214
in a conventional manner. In one embodiment, SRAM
cache tag memory 208, latch 212, system logic
controller 211 and second level cache memory 213 are
fabricated on the same integrated circuit chip 210. In
another embodiment, second level DRAM cache memory 213
and system logic controller 211 are fabricated on
separate chips. In yet another embodiment, second
level DRAM cache memory 213 and system logic controller
211 are fabricated on the same chip as CPU 201 and
first level SRAM cache memory 202.

Because the data paths from CPU bus 204, SRAM
cache tag memory 208, system bus 218 and main memory
214 feed into system logic controller 211, the system
logic controller 211 can manage most of the data
traffic locally without tying up CPU bus 204. For
example, system logic controller 211 controls data
traffic between main memory 214 and second level DRAM
cache memory 213 or between system bus 218 and main
memory 214. Consequently, the loading of CPU bus 204
is lighter and the physical layout of CPU bus 204 is
more compact, thereby allowing for faster operations on
CPU bus 204.

Embodiments of the present invention overcome many
obstacles to successfully use second level DRAM cache
memory 213 as a second level cache memory which matches
the performance of an SRAM-cell based second level
cache. These obstacles include: (1) slow access
latency, (2) precharge time, and (3) refresh arbitration
and control. These obstacles are overcome as described
below.

(1) Slow Access Latency 

The access time of a DRAM cell array consists of a
row access (RAS) time (i.e., the time to decode a row
address, select a row word line, and load a row of data
bits from the DRAM cell array into sense amplifiers in
the column area) and a column access (CAS) latency
(i.e., the time to decode a column address, select a
column, read the data from the sense amplifiers into
data amplifiers, and then propagate the data signal to
the chip input/output area). The sum of the row and
column access latencies is relatively long (45 to 60
ns) compared to SRAM access latency (7 to 25 ns). The
DRAM access is longer because of the two-step
sequential access, as well as the relatively long
column decoding and access time (17 to 25 ns).

Some prior art DRAM devices, such as pseudo-SRAMs
and BiCMOS fast DRAM (e.g., Hitachi's 35 ns 1Mb BiCMOS
DRAM with demultiplexed addressing), use an SRAM-like
interface in which the full row and column addresses
are provided to the chip at one time and the internal
two-step access is performed in a fully asynchronous
fashion. However, these pseudo-SRAM devices combine
the RAS and CAS data operations, precharge time and the
array refresh operation into each access cycle to
emulate standard asynchronous SRAM operations. As a
result, these pseudo-SRAM devices are very slow and
not suitable for cache memory applications.

The BiCMOS fast DRAM has a fast initial access
time because demultiplexed addresses allow row and
column addresses to be loaded at the beginning of each
access cycle using separate address pins. However, the
BiCMOS fast DRAM still operates with the constraints of
traditional asynchronous DRAM (i.e., long access cycle
times and relatively slow subsequent accesses).

In one embodiment of the invention, accesses to
second level DRAM cache memory 213 are made faster by
including a self-timed RAS/CAS/burst sequencer within
second level DRAM cache memory 213. The burst
sequencer merges asynchronous and synchronous
operations of the DRAM accesses in a seamless fashion
as described below. Fig. 3a is a block diagram of a
self-timed RAS/CAS burst sequencer 300 in accordance
with the invention. Burst sequencer 300 represents a
portion of the accessing circuit included within second
level DRAM cache memory 213. Burst sequencer 300
includes control circuit 301, row address register 302,
column address register 303, row decoder 304, row
selector 305, sense amplifier control circuit 306,
delay circuits 307-308, sense amplifier circuit 306,
column decoder 310, column selector 311, data amplifier
circuit 321 and burst sequence controller 313. Burst
sequencer 300 is used to access an array of DRAM memory
cells 317.

Fig. 3b is a waveform diagram illustrating the
operation of burst sequencer 300. To access second
level DRAM cache memory 213, CPU 201 transmits a
control signal through CPU bus 204 to second level DRAM
cache 213. Second level DRAM cache 213 converts this
control signal to an address strobe signal (see, e.g.,
timing control circuit 502, Fig. 5) which is provided
to control circuit 301. CPU 201 also transmits an
address through CPU bus 204 to second level DRAM cache
memory 213. Second level DRAM cache 213 converts this
address to row and column addresses (see, e.g.,
address buffer 503, Fig. 5) which are provided to row
address register 302 and column address register 303,
respectively. In response to the address strobe
signal, control circuit 301 generates a signal which
causes the full row and column addresses (and bank
address, not shown here, to simplify the schematic
diagram) to be latched into registers 302 and 303,
respectively. It is not necessary to simultaneously
latch in the row and column addresses because the
column address is not needed until the RAS operation is
completed. As long as the column address is latched
into register 303 before the completion of the RAS
operation, there is no speed penalty.

The row address stored in row address register 302 
and a row decode signal (Fig. 3b) generated by control 
circuit 301 are transmitted to row decoder 304. In 
response, row decoder 304 decodes the row address and
transmits this decoded address to row selector 305.
Row selector 305 turns on the appropriate word line
(Fig. 3b) (i.e., performs a row selection operation)
within DRAM array 317. 

A sense amplifier enable signal (Fig. 3b) is then 

generated by delay circuit 307 and transmitted to sense
amplifier circuit 306. In response, the sense
amplifiers in sense amplifier circuit 306 turn on to
receive the data values of the selected row within DRAM
array 317. The asynchronous delay introduced by delay
circuit 307 is selected in view of the delays inherent
in row decoder 304, row selector 305 and DRAM array
317, such that sense amplifier circuit 306 is enabled
as soon as the data values from the selected row of
DRAM array 317 are available (i.e., as soon as the row
access operation is completed). Delay circuit 307 can
be realized in a number of different ways, such as an 
inverter chain or an RC circuit. 

At the same time the row access operation is being 
performed, the column address can be provided from 

column address register 303 to column decoder 310, and
column decoder 310 can perform the column decode
operation. At the same time, burst sequence controller
313 can be set up to supply a special address
scrambling sequence based on the initial column address
received from column address register 303. For
example, the modulo-4 burst sequences of Intel's 486
and Pentium microprocessors are 0-1-2-3, 1-0-3-2,
2-3-0-1 and 3-2-1-0.
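
The four sequences listed above can be reproduced by XOR-ing the
starting column offset with a running count. The following is a
minimal sketch of that observation only; the function name and the
power-of-two check are illustrative assumptions and are not taken
from the specification, which does not describe how burst sequence
controller 313 is implemented.

```python
def interleaved_burst(start: int, length: int = 4) -> list[int]:
    """Return the burst order for a given starting column offset.

    Hypothetical helper: models the scrambled (interleaved) order as
    start XOR count, which matches the four modulo-4 sequences quoted
    in the text.
    """
    # The XOR trick only works for power-of-two burst lengths.
    if length <= 0 or length & (length - 1):
        raise ValueError("burst length must be a power of two")
    first = start % length
    return [first ^ count for count in range(length)]


if __name__ == "__main__":
    for start in range(4):
        print(start, interleaved_burst(start))
```

For a starting offset of 1 the sketch yields 1-0-3-2, which agrees
with the second sequence in the list.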

After sense amplifier circuit 306 is turned on and 
the column decode operation has been performed, the 

column decoder 310 receives a column decode enable
signal from delay circuit 308. This column decode
enable signal causes column decoder 310 to provide the
decoded column address to the column selector 311.
After the initial decoded column address is provided to
column selector 311, burst sequence controller 313
causes column decoder 310 to sequentially change the
decoded column address provided to the column selector
311 once during each half clock cycle. As a result,
sequential data (with appropriate burst sequence
scrambling determined by burst sequence controller 313)
are read into data amplifiers 312 synchronous to the 
clock signal. In one embodiment, the clock signal is 
the CPU bus clock signal (i.e., a buffered copy of the 
clock signal provided by CPU bus 204). 

The clock signal is also provided to data
amplifier circuit 312. Data is read from data
amplifier circuit 312 to data line 330 at both the
rising and falling edges of the clock signal (i.e.,
dual edge transfer). In this specification, when an
operation is said to occur "at a clock edge", it is
understood that the operation occurs immediately after
the occurrence of the clock edge.

A burst mode write operation is very similar to 
the read operation, except that data is coming from the 
chip input/output circuitry and is synchronously loaded
into data amplifiers 312 and through column selector
311 into the appropriate sense amplifiers 306 with the
appropriate burst address sequence. The asynchronous
self-timed RAS and CAS operations allow very tight
access timing independent of clock frequency (that is,
the RAS and CAS access time is constant and not a
function of the clock frequency) while at the same time
employing fully synchronous operation for the burst
read/write operation that scales with clock frequency.

Fig. 4 is a schematic diagram of a portion of
column selector 311 and data amplifier circuit 312 of
Fig. 3a. The following discussion describes the manner
in which the circuitry of Fig. 4 provides for dual-edge
data transfer.

Column selector 311 includes tree decoders 311a
and 311b. Tree decoders 311a and 311b are coupled to
sense amplifier circuit 306 through a predetermined
number (e.g., 32) of complementary signal lines. Tree
decoders 311a and 311b are also coupled to column
decoder 310. In the embodiment illustrated, column
decoder 310 provides control signals Sa[7:0] and
Sb[3:0], which cause tree decoders 311a and 311b to
selectively couple one of the sense amplifiers in sense
amplifier circuit 306 to data amplifier circuit 312.

Data amplifier circuit 312 includes data
amplifiers 312a and 312b, multiplexer 907, read data
latch 914, write buffers 903 and 913, tri-state buffer
905 and clock generation circuit 918. The circuitry
illustrated in Fig. 4 services 64 of the sense
amplifiers in sense amplifier circuit 306. The
circuitry of Fig. 4 is repeated for each additional 64
sense amplifiers in sense amplifier circuit 306. In
one embodiment, the total number of data amplifiers in
data amplifier circuit 312 is equal to the number of
bits in each data word read from or written to DRAM
array 317.

The complementary outputs of tree decoders 311a
and 311b are provided to data amplifiers 312a and 312b,
respectively. Data amplifiers 312a and 312b are
regenerative latches which include cross-coupled
transistors 970-973 and transistors 974-977. These
regenerative latches are controlled by a locally
generated, single phase clock signal D_sense.

A local self-timed clock circuit 918 generates the
control signals used to control data amplifiers 312a
and 312b and multiplexer 907. A column precharge
signal, PC, and the D_sense signal are generated in
response to the clock signal, a column-access (CAS)
signal and a bus precharge signal, WE (for write
operation). The CAS and WE signals are generated by
control circuit 301. The clock signal is the same
clock signal illustrated and described in connection
with Figs. 3a and 3b. The PC and D_sense signals are
local signals which are not used to drive any circuitry
outside data amplifier pair 312a and 312b. Thus,
timing skew in the control signals is minimized.

Read operation 

To perform a read operation, the WE signal is
de-asserted high. As a result, transistors 950-953 of
write buffers 903 and 913 are turned off and tri-state
buffer 905 is placed in a low impedance state. The CAS
signal is asserted high. During a first half cycle of
the clock signal, the clock signal is in a logic high
state, thereby forcing both the D_sense and PC signals
to a logic high state. Under these conditions, the
complementary outputs of tree decoders 311a and 311b
are latched in data amplifiers 312a and 312b,
respectively.

For example, a logic low signal on lead 925 and a
logic high signal on lead 926 cause transistors 971 and
972 to turn on and transistors 970 and 973 to turn off.
The high D_sense signal causes transistor 961 to turn
on. As a result, node 991 is pulled down to ground
through transistors 972 and 961 and node 992 is pulled
up to Vdd through transistor 971. In a similar manner,
a logic low signal on lead 926 and logic high signal on
lead 925 results in node 992 being pulled to ground
through transistors 973 and 961 and node 991 being
pulled to Vdd through transistor 970.

Data amplifier 312b operates in the same manner as
data amplifier 312a to latch the signals present on
leads 927 and 928. Thus, a logic high signal on lead
927 and logic low signal on lead 928 results in node
993 being pulled up to Vdd through transistor 974 and
node 994 being pulled down to ground through
transistors 977 and 962. Similarly, a logic low signal
on lead 927 and logic high signal on lead 928 results
in node 993 being pulled to ground through transistors
976 and 962 and node 994 being pulled to Vdd through
transistor 975.

Within multiplexer 907, the high D_sense signal causes
transmission gates 995 and 997 to close (i.e., be
placed in a conducting state) and transmission gate 996
to open (i.e., be placed in a non-conducting state).
As a result, the voltage on node 992 is transmitted
through transmission gate 995 and tri-state buffer 905
to data line 330. Data line 330 connects tri-state
buffer 905 directly to the bus transceivers in the
input/output circuit. This connection results in
little loading other than the routing capacitance
because there is no other signal multiplexed on this
line. Loading of data line 330 is thus substantially
smaller than that present in prior art schemes.
Consequently, the data lines of the present invention
are capable of operating at much higher frequency (up
to 250 MHz). In addition, the voltage on node 993 is
transmitted through transmission gate 997 and is stored
in read data latch 914.

During the second half cycle of the clock signal,
the clock signal transitions low, thereby forcing both
the D_sense and PC signals low. In response to the low
PC signal, transistors 920-923 are turned on. As a
result, leads 925-928 are coupled to Vdd (i.e., leads
925-928 are precharged). In addition, the low D_sense
signal opens transmission gates 995 and 997 and closes
transmission gate 996. As a result, the voltage stored
in read data latch 914 is read out through transmission
gate 996 and tri-state buffer 905 to data line 330
during the second half cycle. In the foregoing manner,
dual-edge transfer of data from sense amplifier circuit
306 to data line 330 is facilitated.
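
The two-phase read path described above can be summarized with a
small behavioral sketch. It is only an illustration under stated
assumptions: the function and variable names are hypothetical, each
clock cycle is assumed to present one value at data amplifier 312a
and one at data amplifier 312b, and no analog behavior (precharge,
regeneration, loading) is modeled.

```python
def dual_edge_read(latched_pairs):
    """Return the values seen on the data line, two per clock cycle.

    latched_pairs: iterable of (value_a, value_b), the values assumed
    to be latched by data amplifiers 312a and 312b in one clock cycle.
    """
    data_line = []
    for value_a, value_b in latched_pairs:
        # First half cycle (clock high): the 312a value is driven onto
        # the data line while the 312b value is parked in the read
        # data latch.
        data_line.append(value_a)
        read_data_latch = value_b
        # Second half cycle (clock low): the parked value is driven
        # onto the data line while the amplifier inputs are precharged.
        data_line.append(read_data_latch)
    return data_line


if __name__ == "__main__":
    print(dual_edge_read([("D1", "D2"), ("D3", "D4")]))
```

Two clock cycles therefore deliver four values, which is the sense
in which the transfer occurs on both clock edges.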

Write operation 

To perform a write operation, the WE signal is
asserted low, thereby placing tri-state buffer 905 in a
high-impedance state and applying a logic low signal to
an input of each of NOR gates 954-957 in write buffers
903 and 913. During a first half cycle of the clock
signal, the clock signal is in a logic low state,
thereby closing transmission gate 906 and opening
transmission gate 916. The signal on the data line 330
is therefore routed to an input of NOR gate 955. For
example, a high signal on the data line 330 causes NOR
gate 955 to provide a logic low signal to transistor
951, thereby turning off this transistor. The low
output of NOR gate 955 is also provided to an input of
NOR gate 954, causing NOR gate 954 to output a logic
high signal which turns on transistor 950.

The low WE signal also causes the D_sense and PC
signals to go high, thereby turning off p-channel
transistors 961-962. As a result, p-channel transistor
971 and n-channel transistor 972 are turned on.
Consequently, tree decoder 311a receives supply voltage
Vdd on lead 926 and the ground supply voltage on lead
925, thereby writing a high data value to the selected
column of sense amplifier.

If the input from data line 330 is a logic low
signal (as opposed to a logic high signal as previously
discussed), tree decoder 311a receives ground supply
voltage on lead 926 and supply voltage Vdd on lead 925
in a manner similar to that previously described. In
this manner, data is written from data line 330 to the
sense amplifiers during each half cycle of the clock
signal. The demultiplexing performed by transmission
gates 906 and 916 is necessary because the address
selected by tree decoders 311a and 311b changes only
once every clock cycle.

Tree decoders 311a and 311b limit the multiplexing
loading to approximately 12 lines (8+4) (as opposed to
512 lines in a typical conventional scheme). The
decreased capacitive loading together with the higher
drive signal provided by data amplifier circuit 312
increase the data bandwidth.
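
The following sketch illustrates why 8+4 select lines are enough to
pick one of the 32 sense amplifiers served by a tree decoder
(8 x 4 = 32). It is an assumption-laden illustration: the one-hot
encoding of Sa[7:0] and Sb[3:0], the function name, and the ordering
of the mapping (group first, then member) are hypothetical and are
not stated in the specification.

```python
def tree_select(sa: int, sb: int) -> int:
    """Map two one-hot select buses to a sense amplifier index 0..31.

    sa: assumed 8-bit one-hot value standing in for Sa[7:0]
    sb: assumed 4-bit one-hot value standing in for Sb[3:0]
    """
    # Exactly one bit must be set on each bus.
    if not 0 < sa < (1 << 8) or sa & (sa - 1):
        raise ValueError("sa must be one-hot on 8 bits")
    if not 0 < sb < (1 << 4) or sb & (sb - 1):
        raise ValueError("sb must be one-hot on 4 bits")
    group = sa.bit_length() - 1    # which of the 8 groups
    member = sb.bit_length() - 1   # which of the 4 members of the group
    return group * 4 + member


if __name__ == "__main__":
    selected = {tree_select(1 << a, 1 << b) for a in range(8) for b in range(4)}
    print(len(selected))  # every one of the 32 positions is reachable
```

A flat one-of-32 selector would need 32 select lines per decoder;
the two-level arrangement needs only 12.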

Delay matching 
Delay mismatch in the column circuitry is
minimized by routing the lines carrying the clock
signal, the pre-decoded column select signals Sa[7:0]
and Sb[3:0], and the data signals in the same manner
through the column area of the memory array.

Alternate Embodiment 

Fig. 5 illustrates an alternate embodiment, in
which the frequency of the clock signal is doubled and
the clock generation circuit 918 (Fig. 4) is modified
such that data values are read from (or written to)
data amplifier circuit 312 at each rising (or falling)
edge of the clock signal. Thus, data values are
transferred on data line 330 at the same rate as the
previously described embodiment (i.e., at a frequency
equal to twice the CPU bus clock frequency). The clock
generation circuit 918 can be modified by transmitting
the doubled clock signal through a 1-bit counter (not
shown) before the clock signal is applied to clock
generation circuit 918. Other modifications to clock
generation circuit 918 to allow the circuitry of Fig. 4
to operate in a single edge transfer mode with respect
to the doubled clock signal would be apparent to one of
ordinary skill.

Phase-locked loop (PLL) circuit 501 buffers the
CPU bus clock signal (from CPU bus 204, Fig. 2) and
generates a clock signal having the same frequency as
the CPU bus clock signal (hereinafter referred to as a
1X clock signal) and a clock signal having twice the
frequency of the CPU bus clock signal (hereinafter
referred to as a 2X clock signal). The 1X and 2X clock
signals have fixed phase relationships with respect to
the incoming CPU bus clock signal. These phase
relationships are selected to provide data set-up and
hold times appropriate for proper data transfer.

Address buffer 503 latches the CPU bus address and
decodes it into row and column addresses (and a bank
address if there are multiple DRAM arrays, not shown
here for purposes of simplicity). Timing control
circuit 502 derives an internal address strobe signal
from the CPU bus address (received from address buffer
503) and a control signal received from CPU bus 204.
The address strobe, row address, column address and 2X
clock signals are provided to burst sequencer 300 and
DRAM array 317. Burst sequencer 300 and DRAM array 317
operate substantially as previously described in
response to these signals (See, Fig. 3a).

Address buffer 503 may have additional latches
(pre-fetch buffers) to store the address of the next
access operation while the current access is still in
progress. The pre-fetch buffers enable pipelined
operation so that back to back operations may be
partially overlapped to reduce the latency cycles
between operations.

The remainder of the circuitry illustrated in Fig.
5 is directed toward solving the problems introduced by
the precharge time required for the DRAM array.
Accordingly, this circuitry is discussed below.

(2) Precharge Time

The operation of a DRAM cell array requires that
after a normal read or write access (RAS + CAS access),
the selected row be de-selected and the sense
amplifiers be turned off and equalized before any
subsequent RAS operations are initiated. This
operation is referred to as a precharge operation. The
time period required to perform the precharge operation
is referred to as the precharge (PRE) time. The PRE
time is sufficiently long to fully equalize the sense
amplifiers and the relatively high capacitance
bitlines, so that the very small signal provided by the
cell capacitor to the sense amplifier in connection
with the next RAS operation can be read correctly and
reliably. The PRE time requirement prevents DRAMs from
executing back to back accesses which SRAMs can easily
support. Thus, the access cycle time of DRAM is much
longer (typically 1.5X to 2X) than its access latency,
while SRAM's access cycle time is approximately equal
to its access latency.

To be able to use DRAM with SRAM performance, the
PRE time must be substantially "hidden" from the access
operations of the CPU bus. Read data buffer 504, write
data buffer 505 and write data buffer 506, illustrated
in Fig. 5, operate to allow the PRE time to be hidden
from the access operations.

As described in more detail below, the 2X clock
signal is used to clock DRAM array 317, the data input
terminal of read data buffer 504, and the data output
terminal of write data buffer 506. The 1X clock signal
is used to clock the data output terminal of read data
buffer 504 and the data input terminal of write data
buffer 505.

Data values are read from DRAM array 317 to CPU
bus 204 through read data buffer 504. Data values are
read into read data buffer 504 at the frequency of the
2X clock signal. Data is then read out of read data
buffer 504 at the frequency of the CPU bus clock
signal. In this manner, read data buffer 504 performs
clock resynchronization.

Conversely, data values are written to DRAM array
317 from CPU bus 204 through write data buffer 505 and
write data buffer 506. Data values are read into write
data buffer 505 at the frequency of the CPU clock
signal and read out of write data buffer 506 at the
frequency of the 2X clock signal.

To minimize clock-to-data skew, DRAM array 317 can
alternatively provide a 2X clock signal to read data
buffer 504 along with the data in a source-synchronous
fashion. The alternative 2X clock signal is a return
clock signal which travels along a path which is
selected such that the 2X clock signal exiting DRAM
array 317 has a preselected delay and phase
relationship with respect to the data values exiting
DRAM array 317.

Fig. 6 is a timing diagram for a 2-1-1-1 data read
burst operation performed by the circuitry of Fig. 5.
After the address strobe signal is asserted low, the
RAS and CAS operations are initiated in a self-timed,
asynchronous fashion (See, Figs. 3a and 3b). Two
rising clock edges after the address strobe signal is
asserted, the RAS and CAS operations are completed and
a burst read operation is performed by DRAM array 317
in a fully synchronous fashion with respect to the 2X
clock signal. The read burst data from DRAM array 317
is clocked into read data buffer 504 by the 2X clock
signal. The read burst data is clocked out of read
data buffer 504 to the CPU bus by the 1X clock signal.
As soon as the read data burst is completed, a
precharge operation is initiated to DRAM array 317,
thereby preparing DRAM array 317 for the next
operation. This next operation can be either a normal
back-to-back access or a pipelined access. Because the
read data burst is written to read data buffer 504 by
the 2X clock signal, there is time left to perform the
precharge operation before the data is read out of read
data buffer 504 by the 1X clock signal. Thus, the
precharge time is hidden from CPU bus 204. If the
precharge time is short enough, DRAM array 317 may be
ready for a subsequent operation at a time which would
allow for pipelined operation.
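
The time made available for precharging can be estimated with
simple arithmetic. The sketch below is an idealized illustration,
not a reproduction of Fig. 6: it assumes the CPU bus drains one word
per 1X clock cycle, that the array bursts one word per cycle of its
own (faster) clock, and it ignores RAS/CAS latency and buffer
overhead. The function name is hypothetical.

```python
from fractions import Fraction


def precharge_slack(burst_words: int, clock_multiple: int) -> Fraction:
    """Return the 1X clock cycles freed for precharge during one burst.

    burst_words: number of words in the burst (e.g. 4)
    clock_multiple: array clock rate relative to the bus clock (e.g. 2)
    """
    if burst_words < 1 or clock_multiple < 1:
        raise ValueError("arguments must be positive")
    bus_cycles = Fraction(burst_words)                     # one word per 1X cycle
    array_cycles = Fraction(burst_words, clock_multiple)   # array finishes sooner
    return bus_cycles - array_cycles


if __name__ == "__main__":
    print(precharge_slack(4, 2))  # cycles available with a 2X array clock
    print(precharge_slack(4, 1))  # no slack when the array runs at the bus rate
```

Under these assumptions a 4-word burst at the 2X rate occupies the
array for two 1X cycles while the bus needs four, leaving two 1X
cycles in which the precharge can proceed unseen by the CPU bus.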

Fig. 7 is a waveform diagram illustrating the
timing diagram for a 2-1-1-1 data write burst
operation. Because the data lines of the CPU bus 204
receive write burst data at a rate equal to the
frequency of the CPU bus clock signal (i.e., the
frequency of the 1X clock signal), a full write burst
is not completed until the end of the 5th clock cycle.
Thus, no time remains in a 2-1-1-1 data write burst
operation to perform a precharge operation.

Therefore, a separate write data buffer 505 (Fig.
5) is used to latch in a first group of write burst
data values (e.g., D1-D4) from CPU bus 204. When a
second group of write burst data values (e.g., D1'-D4')
arrives from CPU bus 204 (there may be multiple
intervening read bursts), the first group of write
burst data values D1-D4 is forwarded to DRAM array 317
through write data buffer 506. The second group of
write burst data values D1'-D4' is then stored in write
data buffer 505. The address strobe signal initiates
the RAS and CAS operations in a self-timed,
asynchronous fashion (See, e.g., Figs. 3a and 3b). The
first group of write burst data values D1-D4 is clocked
from write data buffer 506 by the 2X clock signal in a
fully synchronous fashion. A precharge operation is
initiated after the data values D1-D4 are written to
DRAM array 317. Because the write data burst is
written to DRAM array 317 by the 2X clock signal, there
is time left to perform the precharge operation before
data values D1'-D4' are written to write data buffer
505 by the 1X clock signal. Thus, the precharge time
is hidden from CPU bus 204. Again, if the precharge
time is short enough, DRAM array 317 may be ready for a
subsequent operation at a time which would allow for
pipelined operation.

Fig. 5 also illustrates a data bypass path 510
from write data buffer 506 to read data buffer 504.
Data bypass path 510 allows for the special case where
CPU bus 204 requires access to a group of write burst
data stored in write data buffer 505 or write data
buffer 506, but not yet sent to DRAM array 317. In
this case, the write burst data is transmitted from
write data buffer 506 to read data buffer 504 at the
same time that the write burst data is sent from write
data buffer 506 to DRAM array 317.
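
The posted-write behavior and the bypass case can be pictured with
a toy behavioral model. This is a sketch under stated assumptions
rather than a description of the actual circuitry: the class and
attribute names are hypothetical, addresses are compared whole, the
array is modeled as a dictionary, and all clocking is omitted.

```python
class PostedWriteBuffer:
    """Toy model of a two-stage write buffer in front of a DRAM array."""

    def __init__(self):
        self.dram = {}          # address -> burst already written to the array
        self.stage_505 = None   # most recent (address, burst) from the CPU bus
        self.stage_506 = None   # (address, burst) on its way to the array

    def _retire(self):
        # Burst the contents of the second stage into the array.
        address, burst = self.stage_506
        self.dram[address] = burst
        self.stage_506 = None

    def write(self, address, burst):
        # A new group arriving pushes the previously held group
        # through the second stage and into the array.
        if self.stage_505 is not None:
            self.stage_506 = self.stage_505
            self._retire()
        self.stage_505 = (address, list(burst))

    def read(self, address):
        # Bypass case: the requested data is still buffered, so it is
        # returned to the reader while also being sent to the array.
        if self.stage_505 is not None and self.stage_505[0] == address:
            self.stage_506 = self.stage_505
            self.stage_505 = None
            burst = self.stage_506[1]
            self._retire()
            return burst
        return self.dram.get(address)


if __name__ == "__main__":
    buf = PostedWriteBuffer()
    buf.write(0x10, [1, 2, 3, 4])
    buf.write(0x20, [5, 6, 7, 8])
    print(buf.dram)        # the first group has reached the array
    print(buf.read(0x20))  # the second group is served via the bypass
```

In this model a write completes on the bus as soon as the data is
buffered, and the array is only written when the next group arrives
or when a read forces the buffered data through.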

In alternate embodiments, additional write data
buffers can be connected between write data buffer 505
and CPU bus 204 to add depth to the multi-depth write
data buffer created by write data buffer 505 and write
data buffer 506.

Older microprocessors (i.e., CPUs) may not
support the write data burst access to second level
DRAM cache memory 213. In these microprocessors,
single write accesses are individually transmitted.
However, write data buffer 505 and write data buffer
506 can still operate as previously described to "hide"
the precharge time of DRAM array 317.

In another embodiment, read data buffer 504, write
data buffer 505 and write data buffer 506 can be used
in conjunction with the circuitry illustrated in Figs.
3a and 4. As previously discussed, this circuitry
causes data to be transferred on data line 330 at both
the rising and falling edges of the 1X clock signal
(i.e., dual-edge transfer). When performing dual-edge
transfer with the 1X clock signal, PLL 501 is not
necessary because the two edges of the incoming CPU bus
clock signal provide the necessary timing references
used for data transfer. In such an embodiment, the
input port of read data buffer 504 and the output port
of write data buffer 506 are modified such that they
are clocked by both the rising and falling edges of the
1X clock signal.

In yet other embodiments, the burst methods 
previously described can be performed at other clock 
frequencies (e.g., 4X clock frequency), depending on 
the timing requirements of DRAM array 317. 

The precharge time of DRAM array 317 (Fig. 2) can
alternatively be hidden using tree decoders 311a and
311b (Fig. 4) of column selector 311 (Fig. 3a) as
isolation switches between sense amplifier circuit 306
and data amplifier circuit 312. To electrically
isolate data amplifier circuit 312 from sense amplifier
circuit 306, a disconnect control signal is provided to
column decoder 310. In response, column decoder 310
disables all control signals Sa[7:0] and Sb[3:0] such
that all of the switches in tree decoders 311a and 311b
(Fig. 4) are opened, thereby isolating data amplifiers
312a and 312b from sense amplifiers SA[63:0] and
SA[63:0]. The switches of column selector 311 are
opened as soon as the data amplifiers in data amplifier
circuit 312 have settled. The precharge operation can
be initiated as soon as data amplifier circuit 312 and
sense amplifier circuit 306 are isolated.

Fig. 8 illustrates the timing of a read burst
operation in accordance with this embodiment of the
invention. Isolation occurs after the RAS/CAS access
operations are performed. Thus, the burst read and
precharge operations can be performed simultaneously.
Although Fig. 8 indicates that the burst read operation
is performed at the frequency of the 2X clock signal,
the burst read operation can also be performed at the
frequency of the 1X clock signal because the burst read
operation can be performed at the same time as the
precharge operation. Performing the burst read
operation at the frequency of the 1X clock signal
advantageously reduces read errors and power
consumption.

Once the data amplifier circuit 312 is
disconnected from the sense amplifier circuit 306, data
values can only be accessed from data amplifier circuit
312. Consequently, to support a burst access, the
number of data amplifiers in data amplifier circuit 312
must be sufficient to store all of the data values
required during the burst access. Thus, to support a
burst access of 4 words, there must be enough data
amplifiers in data amplifier circuit 312 to
simultaneously store all of the bits which make up the
4 words. In such an embodiment, multiple sense
amplifier data values are read into multiple data
amplifiers simultaneously. This is in contrast to the
previously described embodiments in which data
amplifier circuit 312 only needs to have a data
amplifier for each of the bits in a single word.

In a variation of the embodiment which uses column
selector 311 as an isolation switch, multiple DRAM
arrays can be simultaneously accessed to provide a
burst access. Thus, four DRAM arrays can be used to
provide a burst access of 4 words. To accomplish this,
a data word is simultaneously stored in the data
amplifier circuit of each of the four DRAM arrays. The
data amplifier circuits of these DRAM arrays are then
disconnected from their associated sense amplifier
circuits. The four words can then be read from the
data amplifier circuits of the DRAM arrays in the
desired order while the DRAM arrays are simultaneously
being precharged.

Fig. 9 illustrates the timing of a write burst
operation in accordance with this embodiment of the
invention. Write data buffer 505 stores a first group
of write burst data values D1-D4. Upon receiving a
second group of write burst data values D1'-D4', data
amplifier circuit 312 is isolated from the sense
amplifier circuit 306 and the first group of write
burst data values D1-D4 is transmitted through write
data buffer 506 to data amplifier circuit 312 at the
frequency of the 2X clock signal. At the same time,
the RAS/CAS access operations are performed. After the
RAS/CAS access operations are complete, data amplifier
circuit 312 is connected to sense amplifier circuit
306, thereby providing the first group of write burst
data values D1-D4 to sense amplifier circuit 306. The
precharge operation is then initiated. Because the
write data is burst at the frequency of the 2X clock
signal, more time is provided to perform the precharge
operation. As a result, the precharge operation can be
performed before a subsequent write burst operation is
to be performed with the second group of write burst
data values D1'-D4'.

In yet another embodiment, the data path between
DRAM array 317 and read data buffer 504 is widened.
Fig. 10 is a timing diagram of a 3-1-1-1 DRAM second
level cache read operation in accordance with this
embodiment of the invention. In this embodiment, the
1X clock signal (generated with or without PLL circuit
501) is used to launch operations within DRAM array
317. A double-wide internal data path, which
simultaneously carries two data values, is provided
between DRAM array 317 and read data buffer 504,
effectively doubling the data transfer rate between
DRAM array 317 and read data buffer 504. Although Fig.
10 illustrates a double-wide internal data path, data
paths having other widths (e.g., triple-wide,
quadruple-wide, etc.) are possible and within the scope
of the invention. The RAS and CAS operations are
launched after the address strobe signal (indicating a
new transaction) is asserted. As soon as the accessed
data values (e.g., D1-D4) are read from DRAM array 317
into data amplifier circuit 312 (Fig. 3a), column
selector 311 disconnects sense amplifier circuit 306
from data amplifier circuit 312 and the precharge
operation is begun. This allows DRAM array 317 to
operate with a minimum cycle time.

The burst data values D1-D4 are transmitted over
the internal data path at the rate of two data values
for each cycle of the 1X clock signal. Thus, data
values D1 and D2 are transmitted during one clock
cycle, and data values D3 and D4 are transmitted during
the subsequent clock cycle. The data values stored in
read data buffer 504 are transferred to CPU bus 204 at
the normal data rate of one data value per cycle of the
CPU bus clock signal.
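
The effect of widening the internal data path can be shown by
grouping the burst values into the bundles carried on each 1X clock
cycle. This is a minimal sketch; the function name and the `width`
parameter are illustrative assumptions, and timing details such as
the RAS/CAS latency are not modeled.

```python
def transfer_wide(values, width=2):
    """Group burst values into the bundles moved per 1X clock cycle.

    values: sequence of burst data values in transfer order
    width: number of values the internal data path carries at once
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return [tuple(values[i:i + width]) for i in range(0, len(values), width)]


if __name__ == "__main__":
    # Double-wide path: two internal cycles move the whole 4-word burst.
    print(transfer_wide(["D1", "D2", "D3", "D4"], width=2))
    # Single-wide path: the same burst needs four cycles.
    print(len(transfer_wide(["D1", "D2", "D3", "D4"], width=1)))
```

With a double-wide path the array side of the transfer finishes in
half the number of 1X cycles that the CPU bus needs to drain the
read data buffer, which is what frees the array for an early
precharge.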

In another embodiment, read data buffer 504 is not
used and the data values are transferred at the CPU bus
data rate directly from DRAM array 317 to CPU bus 204.
All internal RAS/CAS and precharge operations remain as
illustrated in Fig. 10. The data values are
transmitted on a single width data path directly to CPU
bus 204, at a rate of one data value per cycle of the
1X clock signal. The external operation of DRAM second
level cache memory 213 in this embodiment can still be
compatible with standard PBSRAM. This mode of
operation, however, requires the transfer of data
values D3 and D4 from data amplifier circuit 312 to CPU
bus 204 be performed in parallel with the RAS/CAS
access of the next read operation (involving data
values D1'-D4'). Care must be taken to minimize the
potential internal bus contention problem which may
exist between data values D4 and D1'.

The data path between DRAM array 317 and write
data buffer 506 can also be widened. Fig. 11 is a
timing diagram of a 3-1-1-1 DRAM second level cache
write operation which utilizes a widened data path
between DRAM array 317 and write data buffer 506. In
this embodiment, the 1X clock signal is used to launch
internal operations of DRAM array 317. The 1X clock
signal can be generated by PLL circuit 501 or by simple
clock buffering. Although not required by the
invention, Fig. 11 illustrates the internal operation
of DRAM array 317 with a double-wide internal data path
between write data buffer 506 and DRAM array 317. This
double-wide data path effectively doubles the internal
data transfer rate. The RAS and CAS operations are
launched after the address strobe signal (indicating a
new transaction) is asserted. A data burst write
operation from write data buffer 506 of previously
written data values D1_0-D4_0 is performed to data
amplifier circuit 312 in parallel with the RAS/CAS
operation. After write data values D1_0-D4_0 have been
transmitted from data amplifier circuit 312 to sense
amplifier circuit 306, column selector 311 disconnects
sense amplifier



WO 96/16371 PCT/US95/14552 

circuit 306 from data amplifier circuit 312. After
column selector 310 disconnects these elements, a
precharge operation can start immediately, thereby
allowing DRAM array 317 to operate with minimum cycle
time. Meanwhile, write data buffer 506 accepts new
write data values D1-D4 from CPU bus 204 at the normal
data rate as determined by the CPU bus clock signal.

In an alternative embodiment, if a widened
internal data path between write data buffer 506 and
DRAM array 317 is not used, then the precharge
operation will start two clock cycles later than shown
in Fig. 11 (i.e., after all of the previous write data
values D1₀-D4₀ have arrived at DRAM array 317). In this
embodiment, the minimum cycle time will be two clock
cycles longer than in the embodiment described in
connection with Fig. 11. The external operation of
second level DRAM cache memory 213 in this embodiment
will be slower than standard PBSRAM.

Refresh Management and Arbitration

Second level DRAM cache memory 213, with passive
charge storage, requires refresh operations to
periodically (typically every 4 to 64 ms) replenish the
charges stored in each cell capacitor. This is because
junction, transistor and dielectric leakage currents
may cause the stored charge to leak out. Fig. 12
illustrates one embodiment of a refresh management
circuit 800 which can be used in connection with the
present invention. Refresh management circuit 800
consists of address buffer 801, refresh counters 802,
in-progress pointer 803, comparator 804, cache tag
comparator 805 and CPU access delay circuit 806.
Refresh management circuit 800 is used in connection
with an embodiment which uses multiple DRAM arrays
(similar to DRAM array 317) within second level DRAM
cache 213.

Refresh counters 802 keep track of the addresses
of the DRAM arrays and rows to be refreshed next.
Refresh counters 802 periodically initiate a
RAS/PRECHARGE operation to the DRAM arrays and rows
indicated by the refresh counters 802 by transmitting
signals selecting these arrays and rows to the
appropriate DRAM array(s). In-progress pointer 803
indicates the address of the DRAM array currently being
refreshed. Each refresh operation typically lasts 40
to 60 ns.

Any CPU bus access request initiates a comparison
to see if there is a cache hit in the second level DRAM
cache memory 213. At the same time, if there is a
refresh operation in progress, the contents of
in-progress pointer 803 (the DRAM array address for a
single array refresh, or the high order bits of the
DRAM array address for a group array refresh) are
compared by comparator 804 to the address of the
requested DRAM array. Any collision (match of array
addresses) will cause CPU access delay circuit 806 to
delay CPU access until the refresh operation is
completed, in-progress pointer 803 is cleared, and the
appropriate ready signal is sent from CPU access delay
circuit 806 to the CPU bus. The delay of CPU access
delay circuit 806 is set to a predetermined time based
on the known timing of the refresh operation. By
partitioning second level DRAM cache memory 213 into
multiple banks, the probability of a collision during a
refresh operation is proportionately reduced.
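
The collision check described above can be illustrated with a small
behavioral model. This is a sketch under simplifying assumptions, not
the circuit of Fig. 12: the class and method names are hypothetical,
the refresh counter tracks arrays only (not rows), and the refresh
duration is an assumed fixed number of clock cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefreshManager:
    """Behavioral stand-in for the refresh counter, in-progress pointer,
    comparator and CPU access delay circuit (all names hypothetical)."""
    num_arrays: int
    refresh_cycles: int = 4             # assumed fixed refresh length
    next_array: int = 0                 # plays the role of the refresh counter
    in_progress: Optional[int] = None   # plays the role of the in-progress pointer
    remaining: int = 0                  # cycles left in the current refresh

    def start_refresh(self) -> None:
        """Begin refreshing the next array and advance the counter."""
        self.in_progress = self.next_array
        self.remaining = self.refresh_cycles
        self.next_array = (self.next_array + 1) % self.num_arrays

    def tick(self) -> None:
        """Advance one clock cycle; clear the pointer when refresh ends."""
        if self.in_progress is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.in_progress = None

    def access_delay(self, requested_array: int) -> int:
        """Cycles a CPU access must wait; 0 when there is no collision."""
        if self.in_progress is not None and self.in_progress == requested_array:
            return self.remaining
        return 0


mgr = RefreshManager(num_arrays=4)
mgr.start_refresh()
delay_same_array = mgr.access_delay(0)    # collides with the refresh
delay_other_array = mgr.access_delay(1)   # different array, no wait
```

Only an access to the array under refresh is delayed, so with more
arrays (banks) a smaller fraction of accesses can collide, which is the
proportional reduction noted in the text.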

PBSRAM Compatible Embodiment

Fig. 13 is a schematic diagram illustrating a
computer system 1300 which includes CPU 1301, CPU bus
1304, second level cache tag memory 1308, system
controller 1311, system bus 1318, second level DRAM
cache memory 1313, main DRAM memory 1314 and data path
1326. Computer system 1300 is shown in a PBSRAM
compatible environment with key signal pins
illustrated. In this embodiment, second level DRAM
cache memory 1313 includes two 32K x 32 DRAM memory
arrays 1340 and 1341 with pin-out and connections being
compatible with standard PBSRAM. Standard PBSRAM
utilizes the following signals: address input signals
AD[18:3], bi-directional data signals D[63:0],
CPU-initiated address strobe input signal ADSP#, clock
input signal CLK, controller-initiated address strobe
input signal ADSC#, counter advance input signal ADV#,
output enable input signal OE#, various chip enable
input signals CE1#, CE2 and CE3#, byte write enable
input signal BWE#, global write input signal GW#, and
individual byte write control input signals BW#[7:0].
These signals are described in more detail in "Pentium™
Processor 3.3V Pipelined BSRAM Specification", Version
1.2, Intel Corporation, October 5, 1994.

The present invention utilizes several signals in
addition to those enumerated above. Thus, each of DRAM
arrays 1340 and 1341 receives from CPU 1301 a
write/read identification (W/R#) signal which defines
the nature (i.e., read or write) of a CPU-initiated
ADSP# signal. Each of DRAM cache arrays 1340 and 1341
also provides a refresh management (Krdy) signal to,
and/or receives it from, system controller 1311. The Krdy
signal is used to control the management of refresh and
internal operations of DRAM arrays 1340 and 1341. Each
of DRAM cache arrays 1340 and 1341 also receives a
Reset# signal from CPU 1301 for general initialization
and synchronization during power up operations.

Fig. 14a is a timing diagram of transaction-based
DRAM second level cache burst read and write operations
using the signals illustrated in Fig. 13. The timing
diagram of Fig. 14a is compatible with the requirements
of standard PBSRAM. The signal definitions and
operations for the ADSP#, ADSC#, ADV#, CLK, GW#, BWE#,
BW#'s, CE#, and OE# signals are the same as those of
PBSRAM. In a preferred embodiment, the W/R#, BWE# and
GW# signals are used in conjunction with the ADSP# and
ADSC# signals to uniquely define each transaction.
When the ADSP# signal is asserted low at the start of
an ADSP# initiated transaction (shown as R1 & W2 in
Fig. 14a), the W/R# signal must be valid to indicate
whether a read or write operation is to be performed.
In Fig. 14a, a low W/R# signal indicates a read
transaction and a high W/R# signal indicates a write
transaction. In Fig. 14a, the chip enable (CE1#)
signal must be initially low at the beginning of the R1
and W2 operations for these operations to take place
within DRAM arrays 1340 and 1341.

When the ADSC# signal is asserted low at the start
of an ADSC# initiated transaction (shown as W3 & R4 in
Fig. 14a), the BWE# and GW# signals are used to
indicate whether a read or write transaction is to be
performed. If either the BWE# signal or the GW# signal
(not shown) is low, a write transaction is performed.
If neither the BWE# signal nor the GW# signal is low, a
read transaction is performed. At the beginning of the
W3 and R4 operations, the CE1# signal must be in a low
state to cause the W3 and R4 operations to take place
within DRAM arrays 1340 and 1341. The burst read and
burst write operations illustrated in Fig. 14a are
performed in accordance with one of the embodiments
previously described in connection with Figs. 10-11.
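
The transaction decoding rules given for Fig. 14a can be summarized as
a short truth-table function. This is an illustrative sketch only: the
function name and argument names are hypothetical, signals are modeled
as integers (0 = low, 1 = high), and giving ADSP# priority when both
strobes are low is an assumption not stated in the text.

```python
def decode_transaction(adsp_n, adsc_n, wr_n, bwe_n, gw_n, ce1_n):
    """Classify a new transaction from active-low control signals.

    All arguments are logic levels: 0 = low, 1 = high.
    Returns "read", "write", or None when no transaction starts.
    """
    # CE1# must be low for the transaction to take place in the arrays.
    if ce1_n != 0:
        return None
    # ADSP#-initiated: W/R# low means read, W/R# high means write.
    # (Checking ADSP# before ADSC# is an assumption of this sketch.)
    if adsp_n == 0:
        return "write" if wr_n == 1 else "read"
    # ADSC#-initiated: a write if either BWE# or GW# is low, else a read.
    if adsc_n == 0:
        return "write" if (bwe_n == 0 or gw_n == 0) else "read"
    # Neither address strobe asserted.
    return None


r1 = decode_transaction(adsp_n=0, adsc_n=1, wr_n=0, bwe_n=1, gw_n=1, ce1_n=0)
w3 = decode_transaction(adsp_n=1, adsc_n=0, wr_n=0, bwe_n=0, gw_n=1, ce1_n=0)
```

Here `r1` corresponds to an ADSP#-initiated read such as R1, and `w3`
to an ADSC#-initiated write such as W3.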

Fig. 14b is a timing diagram of transaction-based
DRAM second level cache single read and write
operations using the signals illustrated in Fig. 13.
The timing of the signals in Fig. 14b is similar to the
timing of the signals in Fig. 14a, except for the
length of the data phase.

Fig. 15 is a timing diagram which illustrates the
handshake protocol of the Krdy signal in computer
system 1300 (Fig. 13). The signals illustrated in Fig.
15 are the same signals previously described in
connection with Figs. 13 and 14a-14b, with the
exception of the NA# and BRDY# signals. The NA# and
BRDY# signals are generated by system controller 1311
and sent directly to CPU 1301. The NA# signal
indicates that system controller 1311 is ready for the
next address and the BRDY# signal indicates that data
values are ready on CPU bus 1304. The Krdy signal is used
to control the refresh management of second level DRAM
cache memory 1313.

The falling edge of the Krdy signal indicates 
there is a pending refresh or other internal operation 
request, and the rising edge of the Krdy signal 
indicates the refresh or other internal operation has 
been completed. The polarity of the Krdy signal is 
chosen arbitrarily, and opposite polarity can be used 
to accomplish the same effect. Both DRAM cache memory 
1313 and system controller 1311 shall sample the Krdy 
signal at least at the beginning of each new 
transaction, whether the transaction is initiated by 
the ADSP# or ADSC# signal. 

In one embodiment, the handshake protocol of the 
Krdy signal is as follows. If the Krdy signal is high 
at the start of a new transaction, then this 
transaction will proceed to completion normally. 
However, if the Krdy signal is low at the start of a 
new ADSC# transaction, and the Krdy signal has just 
entered this low state (within the last clock cycle) , 
the ADSC# transaction will proceed to completion and be 
followed by a refresh operation. If the Krdy signal 
has been low for more than one clock cycle, the ADSC# 
transaction will be delayed until the Krdy signal goes 
high again. 

If the Krdy signal is low at the start of a new
ADSP# transaction, then the ADSP# transaction will be
delayed until the Krdy signal goes high.
Alternatively, the handshake protocol for ADSP#
transactions can be defined in a similar manner as the
handshake protocol for the ADSC# transactions. Thus,
if the Krdy signal is low at the start of a new ADSP#
transaction and the Krdy signal has just entered this
low state (within the last clock cycle), the ADSP#
transaction will proceed to completion and be followed
by a refresh operation. If the Krdy signal has been
low for more than one clock cycle, the ADSP#
transaction will be delayed until the Krdy signal goes
high again.
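
The handshake rules for ADSC# and ADSP# transactions can be condensed
into a single decision function. This is a sketch for illustration
only: the function name, the string results and the
`adsp_follows_adsc_rule` flag (which selects the alternative ADSP#
protocol) are hypothetical, and the Krdy history is reduced to a count
of consecutive clock cycles for which Krdy has been low.

```python
def krdy_decision(strobe, krdy_low_cycles, adsp_follows_adsc_rule=False):
    """Decide how a new transaction is handled under the Krdy handshake.

    strobe: "ADSP" or "ADSC", the signal that initiated the transaction.
    krdy_low_cycles: consecutive clock cycles Krdy has been low at the
        start of the transaction (0 means Krdy is high).
    adsp_follows_adsc_rule: when True, ADSP# transactions use the same
        rule as ADSC# transactions (the alternative protocol).

    Returns "proceed", "proceed_then_refresh", or "delay".
    """
    if strobe not in ("ADSP", "ADSC"):
        raise ValueError("strobe must be 'ADSP' or 'ADSC'")
    # Krdy high: the transaction runs to completion normally.
    if krdy_low_cycles == 0:
        return "proceed"
    # Krdy went low within the last clock cycle.
    just_fell = krdy_low_cycles == 1
    if strobe == "ADSC" or adsp_follows_adsc_rule:
        return "proceed_then_refresh" if just_fell else "delay"
    # Default ADSP# rule: any low Krdy delays the transaction.
    return "delay"


adsc_case = krdy_decision("ADSC", krdy_low_cycles=1)
adsp_case = krdy_decision("ADSP", krdy_low_cycles=1)
```

A "delay" result means the transaction waits until the Krdy signal goes
high again.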

In another handshake protocol, system controller
1311 will also sample the status of the Krdy signal
when there is a pending ADSC# transaction. If the Krdy
signal is low, then the ADSC# transaction will be
delayed until the Krdy signal goes high. Otherwise,
the pending ADSC# transaction is initiated.

The Krdy signal can be used for multiple purposes.
In another embodiment, the Krdy signal is implemented
as an input/output signal. When multiple DRAM arrays
(e.g., arrays 1340 and 1341 in Fig. 13) are used
together for memory width or depth expansion or both,
the Krdy signal can be used for synchronizing the DRAM
refresh and/or internal operation among the multiple
devices. For example, DRAM array 1340 can be
designated as a master device for refresh management.
This master DRAM array 1340 uses the Krdy signal to
communicate with system controller 1311 and control the
refresh management function. Each of the remaining
DRAM cache memory devices (e.g., DRAM array 1341)
shares the Krdy signal line and is designated as a slave
device. Each slave device samples the state of the
Krdy signal to control or initiate its own refresh or
internal operation in a manner consistent with the
master device, thereby maintaining synchronization with
the master device.

In yet another embodiment, the Krdy signal is
driven by system controller 1311, and each of DRAM
arrays 1340 and 1341, upon detecting a low Krdy signal,
will initiate and complete a pre-defined refresh
operation.

Turning now to the embodiment illustrated in Fig.
15, after the ADSP# signal is asserted low to begin the
W2 write transaction, the Krdy signal is pulled down to
indicate that there is a pending refresh event. Since
the Krdy signal is high when the ADSP# signal is
asserted low, the W2 burst write transaction (involving
data values 2a-2d) is executed to completion in a
normal manner. When the W2 burst write transaction is
completed, a refresh operation is initiated. A read
(R3) transaction is subsequently initiated when the
ADSP# signal is asserted low. At this time, the Krdy
signal is still low because the refresh operation has
not been completed. The low Krdy signal causes the R3
read transaction to be delayed until the refresh
operation is completed. In this example, the R3 read
transaction is delayed by one clock cycle. Even if the
ADSP# signal for the R3 read transaction starts earlier
than shown in Fig. 15, the R3 read transaction is
delayed until the clock cycle shown in Fig. 15 (i.e.,
the operation is delayed until after Krdy returns
high).

The handshake protocol of the Krdy signal can also
be implemented in other manners. In one variation, the
refresh-pending request is initiated from DRAM cache
memory 1313 using the Krdy pin, and system controller
1311 returns an acknowledgment signal on a separate pin
to DRAM cache memory 1313 to instruct DRAM cache memory
1313 to start the refresh operation. The Krdy signal
is then driven high by DRAM cache memory 1313 upon
completion or pending completion of the refresh
operation. This arrangement allows more flexibility in
the design of system controller 1311 because controller
1311 can control when the refresh operation actually
begins.

In another variation, DRAM cache memory 1313
drives the Krdy signal pin to indicate a
refresh-pending condition and system controller 1311
drives the Krdy signal pin to indicate a refresh-start
condition. In this arrangement, the number of clock cycles
required to perform the refresh operation is fixed and
both DRAM cache memory 1313 and system controller 1311
have counters which allow these devices to track the
refresh operation in a consistent fashion.

Although the invention has been described in
connection with several embodiments, it is understood
that this invention is not limited to the embodiments
disclosed, but is capable of various modifications
which would be apparent to one of ordinary skill in the
art. Thus, the invention is limited only by the
following claims.

WHAT IS CLAIMED IS:

1. A computer system comprising:

a central processing unit (CPU);

at least one static random access memory
(SRAM) cache memory integrated with the CPU;

a bus coupled to the CPU; and

a next level cache memory coupled to the bus,
wherein the next level cache memory comprises at
least one dynamic random access memory (DRAM)
array.

2. The computer system of Claim 1, wherein the
next level cache memory further comprises a burst
sequence controller which accesses the DRAM array in a
self-timed manner asynchronous with respect to a bus
clock signal on the bus.

3. The computer system of Claim 2, wherein the
burst sequence controller includes circuitry to access
the DRAM array at both edges of a clock signal derived
from the bus clock signal.

4. The computer system of Claim 1, wherein the
next level cache memory further comprises an address
buffer coupled to the bus, wherein the address buffer
stores row and column addresses required for a
subsequent access to the cache memory while a current
access to the cache memory is in progress.

5. The computer system of Claim 1, wherein the
bus carries a bus clock signal, and the next level
cache memory further comprises:

a phase locked loop (PLL) circuit coupled to
the bus, wherein the PLL circuit generates a DRAM
clock signal having a frequency equal to or
greater than a frequency of the bus clock signal,
wherein the DRAM clock signal is provided to the
DRAM array to control read and write operations
within the DRAM array.

6. The computer system of Claim 1, wherein the
bus carries a bus clock signal, and the next level
cache memory further comprises a read buffer having a
data input port coupled to the DRAM array and a data
output port coupled to the bus, wherein the data input
port is clocked by a DRAM clock signal and the data
output port is clocked by the bus clock signal.

7. The computer system of Claim 6, wherein the
frequency of the DRAM clock signal is higher than the
frequency of the bus clock signal.

8. The computer system of Claim 1, wherein the
bus carries a bus clock signal, and the next level
cache memory further comprises:

a read buffer having a data input port
coupled to the DRAM array and a data output port
coupled to the bus, wherein the data input port is
clocked twice during each cycle of the bus clock
signal and the data output port is clocked once
during each cycle of the bus clock signal.

9. The computer system of Claim 1, wherein the 
bus carries a bus clock signal, and the next level 
cache memory further comprises: 

a write buffer having a data output port 
coupled to the DRAM array and a data input port 
coupled to the bus, wherein the output port of the 
write buffer is clocked by a DRAM clock signal and 
the input port of the write buffer is clocked by 
the bus clock signal. 

10. The computer system of Claim 9, wherein the
frequency of the DRAM clock signal is higher than the
frequency of the bus clock signal.

11. The computer system of Claim 9, wherein the
write buffer is a multi-depth first in, first out
memory capable of storing at least one set of write
burst data.

12. The computer system of Claim 1, wherein the
bus carries a bus clock signal, and the next level
cache memory further comprises:

a write buffer having a data output port
coupled to the DRAM array and a data input port
coupled to the bus, wherein the data input port is
clocked once during each cycle of the bus clock
signal and the data output port is clocked twice
during each cycle of the bus clock signal.

13. The computer system of Claim 12, wherein the
write buffer is a multi-depth first in, first out
memory capable of storing at least one set of write
burst data.

14. The computer system of Claim 1, wherein the
DRAM array comprises a DRAM clock circuit path, wherein
a DRAM clock signal is routed through the DRAM clock
circuit path, whereby the DRAM clock signal is
synchronized with data values written to or read from
the DRAM array.

15. The computer system of Claim 1, wherein the
next level cache memory further comprises:

a sense amplifier circuit coupled to the DRAM
array;

a data amplifier circuit having a plurality
of data amplifiers; and

a column selector coupled between the sense
amplifier circuit and the data amplifier circuit,
wherein the column selector can be selected to
isolate the sense amplifier circuit from the data
amplifier circuit.

16. The computer system of Claim 15, wherein the
data amplifiers in the data amplifier circuit are able
to simultaneously store all of the data values
required for a burst access.

17. The computer system of Claim 15, wherein the
next level cache memory comprises a plurality of DRAM
arrays operating in parallel.

18. The computer system of Claim 1, wherein the
next level cache memory further comprises a refresh
counter which stores the address of a DRAM array which
is currently being refreshed.

19. The computer system of Claim 18, wherein the
next level cache memory further comprises a refresh
collision comparator which compares the address of the
DRAM array currently being refreshed with an address of
a DRAM array requested by the CPU.



20. The computer system of Claim 19, wherein the
next level cache memory further comprises a CPU access
delay circuit coupled to the refresh collision
comparator, wherein the CPU access delay circuit
transmits a ready handshake signal to the bus when the
current refresh operation is completed.

21. The computer system of Claim 1, wherein the
next level cache memory further comprises:

a data buffer having an input port and an
output port, the output port having a first width
which enables the output port to simultaneously
transfer a first number of data values, the input
port having a second width which enables the input
port to simultaneously transfer a second number of
data values, the first number being greater than
the second number, wherein the input port is
coupled to the bus, and the dynamic random access
memory (DRAM) array is coupled to the output port
of the data buffer.



22. The computer system of Claim 21, wherein the
data buffer is a multi-depth data buffer capable of
storing at least one set of write burst data.

23. The computer system of Claim 21, wherein the
input port and the output port are driven by clock
signals of the same frequency.

24. The computer system of Claim 1, wherein the
next level cache memory further comprises:

a data buffer having an input port and an
output port, the input port being a parallel port
having a first width which enables the input port
to simultaneously transfer a first number of data
values, the output port being a parallel port
having a second width which enables the output
port to simultaneously transfer a second number of
data values, the first number being greater than
the second number, wherein the output port is
coupled to the bus, and the dynamic random access
memory (DRAM) array is coupled to the input port
of the data buffer.

25. The next level cache memory of Claim 24,
wherein the data buffer is a multi-depth data buffer
capable of storing at least one set of read burst data.

26. The next level cache memory of Claim 24,
wherein the input port and the output port are driven
by clock signals of the same frequency.

27. The computer system of Claim 1, wherein the
next level cache memory further comprises a plurality
of terminals coupling the DRAM array to the bus,
wherein a first terminal is connected to receive a
first signal from the CPU indicating the start of a new
transaction and a second terminal is connected to
receive a second signal from the CPU which identifies
the new transaction as a read transaction or a write
transaction.

28. The next level cache memory of Claim 27,
wherein the first signal generated by the CPU is an
address strobe input signal compatible with a pipelined
burst static random access memory (PBSRAM) protocol.

29. The computer system of Claim 1, further
comprising a system controller, wherein the next level
cache memory further comprises a plurality of terminals
coupling the DRAM array to the bus, wherein a first
terminal is connected to receive a first signal from
the system controller indicating the start of a new
transaction and second and third terminals are
connected to receive second and third signals,
respectively, from the system controller, wherein the
second and third signals identify the new transaction
as a read transaction or a write transaction.

30. The computer system of Claim 29, wherein the
first signal is an address strobe input signal, the
second signal is a byte write enable input signal and
the third signal is a global write input signal,
wherein the first, second and third signals are
compatible with a pipelined burst static random access
memory (PBSRAM) protocol.

31. The computer system of Claim 1, further
comprising a system controller coupled to the next
level cache memory, wherein the dynamic random access
memory (DRAM) array has a first terminal coupled to the
system controller, wherein at least one signal is
provided to the first terminal to implement a protocol
to manage refresh operations within the DRAM array.

32. The computer system of Claim 31, further
comprising means in the system controller for sampling
the signal at the beginning of read and write
transactions of the DRAM array.

33. The computer system of Claim 31, wherein the
DRAM array further comprises:

a master DRAM which includes the first
terminal; and

one or more slave DRAMs, each having a
terminal coupled to the first terminal of the
master DRAM.






34. The computer system of Claim 33, wherein each 
slave DRAM further comprises an input circuit which 
monitors the state of the signal to control the refresh 
and internal operations of the corresponding slave 
DRAM. 




35. The computer system of Claim 31, wherein the
signal has at least one edge or state driven by the
DRAM array which indicates a pending refresh or
internal operation.

36. The computer system of Claim 35, wherein the 
signal further has at least one edge or state driven by 
the system controller which indicates allowance of the 
pending refresh or internal operation. 

37. The computer system of Claim 35, wherein the 
system controller provides a second signal which has at 
least one edge or state which indicates allowance of 
said refresh or internal operation. 



38. A method of using a DRAM array as a next
level cache memory from a bus of a computer system, the
method comprising the steps of:

operating the bus in synchronization with a
first clock signal at a first clock frequency;

generating a second clock signal in response
to the first clock signal; and

reading data from and writing data to the
DRAM array in synchronization with the second
clock signal.

39. The method of Claim 38, wherein the second 
clock signal is of a higher frequency than the first 
clock frequency. 



40. The method of Claim 38, further comprising
the steps of:

transmitting signals on the bus at either the
rising edges or the falling edges of the first
clock signal; and

performing read operations within the DRAM
array at both the rising and falling edges of the
second clock signal.

41. The method of Claim 40, wherein the frequency
of the second clock signal is the same as the first
clock frequency.

42. The method of Claim 38, further comprising
the steps of:

transmitting signals on the bus at either the
rising edges or the falling edges of the first
clock signal; and

performing write operations within the DRAM
array at both the rising and falling edges of the
second clock signal.

43. The method of Claim 42, wherein the frequency
of the second clock signal is the same as the first
clock frequency.

44. The method of Claim 38, further comprising
the steps of:

performing a row access operation in the DRAM
array in a self-timed manner asynchronous with
respect to the second clock signal;

performing a column decode operation in the
DRAM array in a self-timed manner asynchronous
with respect to the second clock signal; and

performing a predetermined sequence of column
select operations in the DRAM array, wherein the
column select operations are synchronous with
respect to the second clock signal.

45. The method of Claim 38, further comprising
the steps of:

asynchronously reading a plurality of data
values from the DRAM array to a plurality of sense
amplifiers;

synchronously writing the data values from
the DRAM array to a buffer memory using the second
clock signal to clock the data values into the
buffer memory;

synchronously reading the data values from
the buffer memory to the bus using the first clock
signal to clock the data values out of the buffer
memory; and

precharging a plurality of DRAM cells in the
DRAM array during a time that the data values are
synchronously read from the buffer memory to the
bus.

46. The method of Claim 45, wherein the step of
precharging the DRAM array is completed before the step
of reading the data values from the buffer memory is
completed.

47. The method of Claim 38, further comprising
the steps of:

synchronously writing a first set of data
values into a write buffer using the first clock
signal to clock the first set of data values into
the write buffer;

synchronously writing a second set of data
values into the write buffer using the first clock
signal to clock the second set of data values into
the write buffer;

synchronously writing the first set of data
values to the DRAM array during a time the second
set of data values are written into the write
buffer, wherein the second clock signal is used to
clock the first set of data values into the DRAM
array; and

precharging a plurality of DRAM cells in the
DRAM array during a time the second set of data
values are written into the write buffer.

48. The method of Claim 47, wherein the step of
precharging the DRAM array is completed before the step
of writing the second set of data values into the write
buffer is completed.

49. The method of Claim 47, wherein a data
amplifier circuit and a sense amplifier circuit are
coupled between the write buffer and the DRAM array and
the step of synchronously writing the first set of data
values to the DRAM array further comprises the steps
of:

disconnecting the data amplifier circuit from
the sense amplifier circuit;

writing the first set of data values from the
write buffer to the data amplifier circuit at the
same time as performing a row access operation in
the DRAM array; and

connecting the data amplifier circuit to the
sense amplifier circuit, thereby causing the first
set of data values to be provided to the DRAM
array through the sense amplifier circuit.

50. The method of Claim 38, further comprising
the steps of:

reading a plurality of data values from the
DRAM array to a plurality of sense amplifiers;

closing a plurality of column selector
switches coupled between the sense amplifiers and
a plurality of data amplifiers, thereby connecting
the sense amplifiers to the data amplifiers;

opening the column selector switches, thereby
disconnecting the sense amplifiers from the data
amplifiers; and

reading the data values out of the data
amplifiers and precharging a plurality of DRAM
cells in the DRAM array at the same time.

51. The method of Claim 50, wherein the step of 
reading the data values out of the data amplifiers 
10 further comprises the steps of: 

writing the data values from the data 
amplifiers to a read buffer using the second clock 
signal to clock the data values into the read 
buffer; and 

reading the data values from the read buffer
to the bus using the first clock signal to clock
the data values onto the bus.

52. The method of Claim 51, wherein the step of
precharging the DRAM array is completed before the step
of reading the data values from the read buffer to the
bus is completed.

53. The method of Claim 38, wherein an input port
of a data buffer memory and an output port of the data
buffer memory are coupled to the bus, and further
comprising the steps of:

outputting data from the output port of the 
data buffer memory in response to the first clock 
signal;

inputting data to the input port of the data 
buffer memory in response to the second clock 
signal; 

transmitting a plurality of data values from
the DRAM array to the input port of the data
buffer memory during each cycle of the second
clock signal; and

transmitting a single data value from the
output port of the data buffer memory to the bus
during each cycle of the first clock signal.

54. The method of Claim 53, wherein the first and 
second clock signals each have the same frequency. 

55. The method of Claim 53, further comprising
the step of precharging the DRAM array during the step
of transmitting a single data value.

56. The method of Claim 55, wherein the DRAM
array includes an array of DRAM memory cells, a sense
amplifier circuit coupled to the array of DRAM memory
cells, a column selector circuit coupled to the sense
amplifier circuit, and a data amplifier circuit coupled
to the column selector circuit, the method further
comprising the steps of:

reading the plurality of data values from the
array of DRAM memory cells to the data amplifier
circuit through the sense amplifier circuit and
the column selector circuit;

disconnecting the data amplifier circuit from
the sense amplifier circuit using the column
selector circuit; and then

precharging the DRAM array at the same time
that the plurality of data values are transmitted
to the input port of the data buffer memory.
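
The read path recited in Claims 53 to 56 can be illustrated with a small behavioral model. The sketch below is illustrative only and is not part of the claimed method. It assumes that both clock signals have the same frequency (as in Claim 54), that a value written into the data buffer memory can be read out in the same cycle, and that the burst length and path width are free parameters. The function and variable names are hypothetical.

```python
from collections import deque


def simulate_read_burst(burst_len=4, width=4):
    """Model one read burst through a width-converting data buffer.

    Hypothetical model: both clocks tick together (same frequency) and a
    value written into the buffer can be read out in the same cycle.
    Returns the values seen on the bus and the number of cycles in which
    the DRAM array is not transferring data.
    """
    pending = list(range(burst_len))   # values still held by the data amplifiers
    data_buffer = deque()              # the data buffer memory
    bus = []                           # values seen on the bus, in order
    dram_busy_cycles = 0

    while len(bus) < burst_len:
        if pending:
            # Second clock: up to `width` values enter the input port at once.
            data_buffer.extend(pending[:width])
            pending = pending[width:]
            dram_busy_cycles += 1
        # First clock: exactly one value leaves the output port per cycle.
        bus.append(data_buffer.popleft())

    total_cycles = len(bus)
    # Cycles in which the array is idle and could be precharged.
    return bus, total_cycles - dram_busy_cycles
```

Under these assumptions, a burst of four values over a four-value-wide path occupies the array for one cycle and leaves three bus cycles available for precharging, whereas a one-value-wide path leaves none.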

57. The method of Claim 38, wherein an input port 
of a data buffer memory is connected to the bus and an 
output port of the data buffer memory is connected to 
the DRAM array, the method further comprising the steps
of:

inputting data to the input port of the data
buffer memory in response to the first clock
signal;

outputting data to the output port of the
data buffer memory in response to the second clock
signal;

transmitting a single data value from the bus
to the input port of the data buffer memory during
each cycle of the first clock signal; and

transmitting a plurality of data values from
the output port of the data buffer memory to the
DRAM array during each cycle of the second clock
signal.

58. The method of Claim 57, wherein the first and
second clock signals each have the same frequency.

59. The method of Claim 57, further comprising
the step of precharging the DRAM array during the step
of transmitting a single data value.

60. The method of Claim 38, wherein a system
controller is coupled to a first terminal of the DRAM 
array, the method further comprising the steps of: 

transmitting a first signal from the CPU to
the DRAM array to initiate a transaction requested
by the CPU;

transmitting a second signal from the system
controller to the DRAM array to initiate a
transaction requested by the system controller;

transmitting a third signal to the first 
terminal of the DRAM array to implement a protocol 
to manage refresh and internal operations within 
the DRAM array. 

61. The method of Claim 60, wherein the third
signal has a first state and a second state, wherein a
transition from the first state to the second state
indicates a pending refresh or internal operation, and
a transition from the second state to the first state
indicates a completed refresh or internal operation,
the method further comprising the steps of:

monitoring the first and second signals to
determine when the first signal or the second
signal is asserted;

determining the state of the third signal
when either the first signal or the second signal
is asserted; and

performing transactions requested by the CPU
and by the controller to completion if the third
signal is in the first state when either the first
signal or the second signal is asserted.

62. The method of Claim 61, further comprising 
the steps of:

performing a transaction requested by the
controller to completion and then performing a
refresh or internal operation if the third signal
is in the second state and has been in the second
state for less than a first predetermined time
period when the first signal is asserted; and

performing a refresh or internal operation
and delaying the CPU-requested transaction if the
third signal has been in the second state for at
least the first predetermined time period when the
first signal is asserted.

63. The method of Claim 62, further comprising
the step of performing a refresh or internal operation
and delaying a transaction requested by the controller
if the third signal is in the second state when the
second signal is asserted.

64. The method of Claim 62, further comprising 
the steps of:

performing a transaction requested by the
controller to completion and then performing a
refresh or internal operation if the third signal
is in the second state and has been in the second
state for less than a first predetermined time
period when the second signal is asserted; and

performing a refresh or internal operation
and delaying the controller-requested transaction
if the third signal has been in the second state
for at least the first predetermined time period
when the second signal is asserted.
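
The ordering rule recited in Claims 61 and 64 for a controller-requested transaction can be summarized as a small decision function. The sketch below is illustrative only and is not part of the claimed method. It assumes that the second state of the third signal is represented as a boolean, that the time spent in that state is available as a number, and that "refresh" stands for any refresh or internal operation. The function name, argument names, and time units are hypothetical.

```python
def arbitrate(refresh_pending, pending_time, threshold):
    """Return the order of operations when a request signal is asserted.

    refresh_pending: True when the third signal is in its second state.
    pending_time:    how long the third signal has been in that state.
    threshold:       the "first predetermined time period" (hypothetical units).
    """
    if not refresh_pending:
        # Claim 61: third signal in the first state, so the transaction
        # is performed to completion.
        return ["transaction"]
    if pending_time < threshold:
        # Claim 64, first step: finish the transaction, then refresh.
        return ["transaction", "refresh"]
    # Claim 64, second step: refresh first and delay the transaction.
    return ["refresh", "transaction"]
```

For example, with a threshold of 5 units, a refresh that has been pending for 2 units is deferred until the transaction completes, while one that has been pending for 5 units or more is performed first.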

65. The method of Claim 61, further comprising
the steps of:

performing a refresh or internal operation
and delaying a transaction requested by the CPU if
the third signal is in the second state when the
first signal is asserted.

66. The method of Claim 60, wherein the DRAM
array comprises a master DRAM device and at least one
slave DRAM device, the method further comprising the
steps of:

providing the first, second and third signals
to the master DRAM device and the slave DRAM
devices;

transmitting the third signal between the
master DRAM device and the system controller;

sampling the third signal with the slave DRAM
devices; and

controlling the refresh and internal
operations of the slave DRAM devices in response
to the third signal.

67. The method of Claim 60, further comprising
the step of driving the third signal with the system
controller.

68. The method of Claim 60, further comprising
the step of driving the third signal with the DRAM
array to indicate pending refresh or internal
operations.

69. The method of Claim 68, further comprising
the step of driving the third signal with the system
controller to indicate allowance of refresh or internal
operations.

70. The method of Claim 68, further comprising
the step of driving a fourth signal with the system
controller to indicate allowance of refresh or internal
operations.